%% This is file `cag-template.tex',
%%
%% Copyright 2018 Elsevier Ltd
%%
%% This file is part of the 'Elsarticle Bundle'.
%% ---------------------------------------------
%%
%% It may be distributed under the conditions of the LaTeX Project Public
%% License, either version 1.2 of this license or (at your option) any
%% later version.  The latest version of this license is in
%%    http://www.latex-project.org/lppl.txt
%% and version 1.2 or later is part of all distributions of LaTeX
%% version 1999/12/01 or later.
%%
%% The list of all files belonging to the 'Elsarticle Bundle' is
%% given in the file `manifest.txt'.
%%
%% Template article for Elsevier's document class `elsarticle'
%% with harvard style bibliographic references
%%
%% $Id: cag-template.tex 151 2018-11-22 04:42:39Z rishi $
%%
%% Use the options `twocolumn,final' to obtain the final layout
%% Use `longtitle' option to break abstract to multiple pages if overfull.
%% For Review pdf (With double line spacing)
%\documentclass[times,twocolumn,review]{elsarticle}
%% For abstracts longer than one page.
%\documentclass[times,twocolumn,review,longtitle]{elsarticle}
%% For Review pdf without preprint line
%\documentclass[times,twocolumn,review,nopreprintline]{elsarticle}
%% Final pdf
%\documentclass[times,twocolumn,final]{elsarticle}
%%
\documentclass[times,twocolumn,final]{elsarticle}
%%


%% Stylefile to load CAG template
\usepackage{cag}
\usepackage{framed,multirow}
\usepackage{bm}
\usepackage{amsmath}

%% Algorithm packages
\usepackage[linesnumbered,ruled,vlined]{algorithm2e}

%% Figure
\usepackage{overpic}
\usepackage{framed}
\usepackage{color}
\usepackage{subfigure}
\usepackage[export]{adjustbox}

%% The amssymb package provides various useful mathematical symbols
\usepackage{amssymb}
\usepackage{latexsym}

% Following three lines are needed for this document.
% If you are not loading colors or url, then these are
% not required.
\usepackage{url}
\usepackage{xcolor}
\definecolor{newcolor}{rgb}{.8,.349,.1}

\usepackage{hyperref}

\usepackage[switch,pagewise]{lineno} %Required by command \linenumbers below
\usepackage{ReviewCommand}

\journal{Computers \& Graphics}

\begin{document}

\verso{Preprint Submitted for review}

\begin{frontmatter}

\title{Foveated Light Culling}%

%\received{1 February 2017}
\received{\today}
%%%% Do not use the below for submitted manuscripts
%\finalform{28 March 2017}
%\accepted{2 April 2017}
%\availableonline{15 May 2017}
%\communicated{S. Sarkar}


\begin{abstract}
%%%
In this paper, we propose a novel \textit{Foveated Light Culling} (FLC) method to efficiently approximate global illumination for foveated rendering in virtual reality applications. The key idea is to cull the virtual point lights (VPLs) in accordance with their contribution to the visual perception, which makes the loss of details caused is imperceptible to the human eyes. Based on this idea and the features of human visual system (HVS), we calculate the contribution of each VPL according to samples generated from a perceptual importance map. By estimating the local visibility between VPLs and samples, our method can greatly improve the performance during the process of evaluating contributions. Our method is able to produce high-quality single-bounce diffuse-to-diffuse indirect illumination in the foveal region at real-time frame rate while supporting dynamic scenes and light sources. Furthermore, the experiments show that our approach makes a good balance between foveal quality and render efficiency.
%%%%
\end{abstract}

\begin{keyword}
%% MSC codes here, in the form: \MSC code \sep code
%% or \MSC[2008] code \sep code (2000 is the default)
%\MSC 41A05\sep 41A10\sep 65D05\sep 65D17
%% Keywords
\KWD Virtual Reality\sep Foveated Rendering\sep Global Illumination
\end{keyword}

\end{frontmatter}

\linenumbers

%% main text
\section{Introduction}
Head mounted displays (HMDs) are one of the most popular display devices for virtual reality (VR) or augmented reality (AR) applications. Different from rendering images on normal displays, generating images on HMDs has more chances to exploit the features of HVS to improve the rendering efficiency/quality. For instance, even though HVS has a very wide field of view (160$^{\circ}$ horizontally and 135$^{\circ}$ vertically), it just can resolve the fine details within a 5$^{\circ}$ central circle \cite{guenter2012foveated}. The visual acuity of HVS decreases radially between the gaze point (the fovea) and the periphery. Equipped with an eye tracker, foveated rendering systems \cite{cheng2003foveated, weier2017perception, he2014extending} significantly improve the rendering performance by decreasing shading quality in the peripheral regions while maintaining high fidelity in the foveal area.

Meanwhile, with the increasing\DEL{ly} demand for photorealistic rendering, global illumination becomes the major trend for almost all graphics application, including generating images on HMDs. Many methods have been proposed to combine global illumination with foveated rendering. For example, \citet{wang2020foveated} has proposed a model to adapt instant radiosity into foveated rendering, which generates VPLs from the camera to provide indirect illumination to the foveal region. \citet{koskela2019foveated} render\DEL{s} scenes in visual-polar space, thereby reducing the amount of computation\DEL{s} for path tracing. However, finding a good balance between the rendering quality and performance for foveated rendering is still an open problem.

In this paper, we propose a method based on \textit{Instant Radiosity} \cite{keller1997instant} to efficiently produce indirect illumination for foveal region. \ADD{Although \cite{wang2020foveated} is a similar idea, its results are poor in many scenes due to two obvious problems: The first is that it generates VPLs from the camera, making it difficult for VPLs to guarantee their contributions to the foveal region. The second problem is that it only considers the relationship of visibility and locations between VPLs and the foveal region when calculating the importance, which results in great errors.}

\ADD{In our work, we adopt a different VPLs generation method to ensure that they have enough intensities to provide indirect illumination for the foveal region.} \REP{In}{Furthermore, in} order to make a good balance between visual details and performance, we propose an efficient method to evaluate the contributions of VPLs to the final image quality, which is majoritively determined by the foveal region. Based on these contributions, the important VPLs have higher chances to be kept to compute indirect illumination. More specifically, we propose the following two strategies to achieve a good balance between rendering performance and quality. The first strategy is an importance map based on HVS features to describe the importance of each pixel to perception of indirect illumination. And the second strategy is an efficient method to fast compute the visibilities between VPLs and shading points.

%-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
\section{Related Work}\label{sec:RelatedWorks}
In this section, we provide an overview of main fields of researches related to our work.\DEL{ First of all, we introduce the prior work for foveated 3D rendering in recent years. And then, we present the previous instant radiosity rendering work, upon which our FLC method is based on. After that, we briefly discuss the topic about visual perception model which is important to HVS. At last, we also discuss light culling techniques that improve efficiency in the real-time rendering area.}

\textbf{Foveated 3D Rendering} is proposed for VR applications \cite{guenter2012foveated}, which improves rendering efficiency by reducing the image quality in the periphery. \ADD{Many researchers focus on geometric simplification.} \citet{lindeberg2016concealing} proposed a novel technique to adjust tessellation levels.  \citet{cheng2003foveated} presented enhancements to 3D model simplification based on interactive level-of-detail (LOD) update with foveation. \citet{weier2017perception} had reviewed some methods focusing on mesh simplification in the peripheral region \cite{reddy2001perceptually,hu2009parallel}.

Multi-resolution and multi-rate shading are the strategies which enable the adaption of shading complexity to the scene content. \citet{guenter2012foveated} proposed a three-pass pipeline for foveated 3D rendering by using three nested layers.  \citet{tursun2019luminance} presented a luminance contrast aware algorithm to analyze displayed luminance contrast. \citet{vaidyanathan2014coarse} proposed a method to sample coarse pixels in the peripheral region. \citet{he2014extending} introduced multi-rate GPU shading, which enables more shading samples near regions of specular highlights, shadows, edges, and motion blur regions. \ADD{Our method is also based on multi-resolution, the choosed VPLs are contributing more to the foveal region and less to the periphery.}

\textbf{Instant Radiosity} \cite{keller1997instant} is a method that approximates the indirect illumination within real-time requirements. Many researchers worked on the strategies of VPLs distribution \cite{davidovivc2010combining,segovia2006bidirectional,simon2015rich}. Some other research\DEL{es} also dedicated in estimation of VPLs' contributions to the final output image. \citet{segovia2006bidirectional} presented a coarse approach that regards visibility as the contribution.  \citet{hedman2016sequential} proposed a mail-boxing scheme to determine indirect visibility for static scenes. \citet{wang2020foveated} emitted the rays from camera to distribute VPLs. \ADD{We integrate the instant radiosity method into foveated rendering framework and distribute VPLs by stochastically culling them according to their estimated contributions to the foveal region.}

\textbf{Visual Perception Model} describes the details that can actually be perceived by the observer. \citet{weymouth1963visual} had shown an approximately linear degradation behaviour of acuity with eccentricity \REP{angular}{angle}. \citet{longhurst2006gpu} proposed a fast approach to extract salliencies for visual cues by pre-rendering a low resolution frame. \citet{stengel2016adaptive} introduced a flexible image-space sampling strategy to incorporate more visual cues in perceptual model to sparsely sample shading points. \ADD{In our method, we generate an importance map based on HVS to guide the sampling of shading points which used to estimate the contributions of VPLs.}

\textbf{Light Culling} is a technique with clamping light ranges to improve rendering efficiency in real-time applications. \citet{harada2012forward+} presented a forward+ method to support rendering many lights by storing only lights that contribute to the pixel. \citet{harada20122} also proposed a 2.5D culling for further improving the forward+ rendering pipeline. \citet{tokuyoshi2016stochastic} introduced a stochastic light culling method that randomly determines a light range to avoid undesirable bias. \citet{o2017tiled} presented a new approach called tiled light tree that adapts to the light source distribution.

\ADD{In this paper, we apply foveation-based rendering to \textit{Instant Radiosity}, which can produce plausible global illumination in VR applications. It also reaches the real-time frame rates by culling VPLs based on the visual perception of HVS. In contrast to previous work, our culling method is based on the features of HVS, not just the geometric information in the image space, which means that the rest VPLs used for shading contribute more to the foveal region.}

 \begin{figure*}[ht]
	\includegraphics[width=\linewidth]{./Figures/overview.pdf}
	\caption{Overflow of our algorithm: (1) Geometry stage: generating G-buffer and RSMs for scene and light sources respectively; (2) Importance sampling stage: creating perceptual probability map and choosing pixels as samples based on this map; (3) Light culling stage: estimating the contribution of each VPL and stochastically culling VPLs according to their contributions; (4) Rendering stage: shading with rest VPLs using accelerated algorithms.}
	\label{fig:AlgorithmOverview}
\end{figure*}

%-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
\section{Algorithm Overview}\label{sec:AlgorithmOverview}
In this paper, we try to generate high quality single-bounce indirect illumination in the foveal region $\mathbb{F}$ at the cost of image quality in peripheral region $\mathbb{P}$ by instant radiosity method. The basic idea is to stochastically cull the VPLs according to their contributions to the final image quality, which is majoritively determined by the image quality of $\mathbb{F}$.

Generally speaking, the contribution $C_{l_v}(\mathbb{I})$ of a VPL $l_v$ to the final image quality can be described as:

\begin{equation}
	C_{l_v}(\mathbb{I}) = w_\mathbb{F} C(l_v \rightarrow \mathbb{F}) + w_\mathbb{P} C(l_v \rightarrow \mathbb{P})
	\label{eq:AccurateContribution}
\end{equation}
where $\mathbb{I}=\mathbb{F}\bigcup\mathbb{P}$ represents an image, $w_\mathbb{F}$, and $w_\mathbb{P}$ is the weight to $\mathbb{F}$ and $\mathbb{P}$ respectively. The contribution $C(l_v \rightarrow \mathbb{R})$ of VPL $l_v$ to a pixel region $\mathbb{R}$ can be computed from:

\begin{equation}
	\begin{aligned}
		C(l_v \rightarrow \mathbb{R}) &= \frac{1}{N} \sum_{i=1}^{N} f(l_v \rightarrow p_i)\\
		                                   &= \frac{1}{N} \sum_{i=1}^{N} f(I_{l_v}, V_i, G_i, \bm{\mathfrak{p}}_i)
	\end{aligned}
	\label{eq:PixelContribution}
\end{equation}
where $f$ represents the VPL's contribution to pixel $p_i \in \mathbb{R}$, ${N}$ is the number of pixels in $\mathbb{R}$, and $I_{l_v}$ is the intensity of VPL $l_v$. $\bm{\mathfrak{p}}_i, V_i$ and $ G_i$ represent the weight of pixel $p_i$, the visibility and geometry term between \REP{VPL}{$l_v$} and pixel $p_i$ respectively.

However, it is impractical to directly employ Eq.\ref{eq:AccurateContribution} and Eq.\ref{eq:PixelContribution} to evaluate the contribution of $l_v$ to image $\mathbb{I}$ because of the high computation cost. For instance, it involves all pixels on $\mathbb{I}$, the term $V(l_v, p_i)$ needs to solve visibilities between all VPLs and shading points $p_i$ and so on.

In this paper, we propose a novel method called \textit{Foveated Light Culling} to efficiently approximate $C_{l_v}(\mathbb{I})$. Based on $C_{l_v}(\mathbb{I})$, VPLs are stochastically culled and the remained VPLs are employed to generate the indirect illumination. Note that $C_{l_v}(\mathbb{I})$ is majoritively determined by the foveal region $\mathbb{F}$, thus the image quality of $\mathbb{F}$ can be guaranteed.

The procedure of our algorithm is as illustrated in Fig.\ref{fig:AlgorithmOverview}. Our method focuses on solving the following two issues to achieve a good balance between the final image quality and rendering performance:

\begin{itemize}
	\item \textit{How to reduce the number of pixels involved in the computation of $C_{l_v}(\mathbb{I})$ without compromising the quality of perception?} According to the features of HVS, the following three factors are important to visual perception: visual acuity, geometry silhouette and indirect illumination. Based on these factors, we compute an importance map for image $\mathbb{I}$ to describe the importance of each pixel to perception of indirect illumination. And then, an importance sampling process is conduced according to the importance map to choose pixels as samples $p^\prime$ to participate in the computation of Eq.\ref{eq:AccurateContribution} (S-3 and S-4 in Fig.\ref{fig:AlgorithmOverview} and see Sec.\ref{sec:ImportanceSampling} for more details);
	\item \textit{How to efficiently estimate visibilities between VPLs and pixels?} Visibilities between VPLs and pixels play an important role in the computation of indirect illumination. Though the modern GPUs support hardware ray tracing to solve this problem, it is still impractical to conduct such huge number of visibility tests in real-time. In this paper, we propose a \textit{Bidirectional Local-Space Occlusion}(BLSO) method to efficiently find an approximate solution of the visibility problem by exploiting local occlusion information (see Sec.\ref{sec:StochasticLightCulling} for more details).
\end{itemize}

In addition, we need G-Buffer to extract the geometry information of each pixel and RSMs to generate VPLs (S-1 and S-2 in Fig.\ref{fig:AlgorithmOverview}). After culling the low-contribution VPLs, we use the rest to render the scenes with interleaved sampling and log-polar transformation (S-7 in Fig.\ref{fig:AlgorithmOverview} and see Sec.\ref{sec:Implementation} for more details).

%-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
\section{Sampling of Shading Points}\label{sec:ImportanceSampling}
In this section, we present our method to reduce the number of pixels involved in the computation of $C_{l_v}$. The basic idea is to compute an importance map to describe the importance of each pixel $p_i$ in the evaluation of $C_{l_v}(\mathbb{I})$ based on its own properties. For example, considering two pixels $p_1\in\mathbb{F}$ and $p_2\in\mathbb{P}$, $p_1$ should be more important than $p_2$ because our priority is to guarantee the image quality in $\mathbb{F}$.

\begin{figure*}
	\centering
	\begin{overpic}[width=1.\textwidth]{Figures/probability.pdf}
		\put(7,0){(a)}
		\put(27,0){(b)}
		\put(47,0){(c)}
		\put(67,0){(d)}
		\put(91,0){(e)}
	\end{overpic}
	\caption{Importance Map. (a) Visual acuity importance map. (b) Geometry silhouette importance map. (c) Indirect brightness importance map. (d) Final importance map. (e) Reference scene with foveal region (in red circle).}
	\label{fig:Probability}
\end{figure*}

\subsection{Generation of Importance Map}\label{sec:ProbabilityMap}
In our algorithm, we compute an importance value $\bm{\mathfrak{p}}\in[0,1]$ for each pixel, i.e. the weight of each sample. This value only depends on pixel's own properties, like position in image space, and has nothing to do with VPLs. \ADD{Meanwhile, the properties of a pixel that we considered are mainly based on HVS. In fact, whether it is a natural scene or a virtual scene, the final image in the HVS is two-dimensional, and the human eye can only be sensitive to a small area of an image at any given time. In other words, the theory of visual perception is determined by the structure and function of the human eye itself and does not depend on the external scene.} \REP{Inspired}{Therefore, inspired} by \citet{stengel2016adaptive}, we propose to use visual acuity, geometry silhouette and indirect brightness to compute the importance. And all the importance values form an importance map $P$ which has the same resolution as image $\mathbb{I}$.

\subsubsection{Visual Acuity}\label{sec:AcuityFallOff}
Visual acuity rates the ability to recognize small details. Many researches have proved that the acuity decays linearly within a certain range \cite{weymouth1963visual, strasburger2011peripheral}, which is often used in the foveated rendering methods \cite{stengel2016adaptive, guenter2012foveated, vaidyanathan2014coarse} to simulate the reduction of resolvable resolution away from gaze position.

In order to approximate the attenuation of acuity to eccentricity angle, we assume that the acuity falls off linearly throughout the field of view. We propose to use Eq.\ref{eq:AcuityImportance} to describe the acuity importance $\bm{\mathfrak{p}}_a(p)$ of pixel $p$, which can be applied to arbitrary gaze position $c_g$ in image space.

\begin{equation}
	\bm{\mathfrak{p}}_a(p) = {\rm clamp}(h(c_p, c_g), \omega_p, 1.0)
	\label{eq:AcuityImportance}
\end{equation}
where $c_p$ is the 2D image coordinate of pixel $p$, $\omega_p$ is a constant acuity in the outer periphery, $h(c_p, c_g)$ is employed to simulate the falloff of the acuity, which can be written as:

\begin{equation}
	h(c_p, c_g) = 1.0 +  s \cdot \left(m(r, \alpha) - e(c_p, c_g) \right)
	\label{eq:AcuityFallOff}
\end{equation}
where acuity slope $s$ is a user-defined value, $e(c_p, c_g)$ represents the eccentricity angle. The function $m(r, \alpha)$ \REP{is used to adjust the range of the foveal region}{represents the range of foveal region where the visual acuity is acute. It is} based on the ratio $r$ of foveal region to screen and radian $\alpha$ of FOV:

\begin{equation}
	m(r, \alpha) = atan \left( \REP{$\frac{\sqrt{r}}{2} \cdot \alpha$}{$\sqrt{r} \cdot tan\frac{\alpha}{2}$} \right)
	\label{eq:FovealAcuity}
\end{equation}

As for the eccentricity angle $e(c_p, c_g)$, it \REP{can be computed as:}{is the angle of the pixel $p$ away from the gaze position $c_g$, which can be evaluated as Eq.\ref{eq:EccentricityAngle}.}
\begin{equation}
	\begin{aligned}
		k(c_p, c_g) &= \frac{|| c_p - c_g ||_2}{|| \bm{\mathfrak{d}} ||_2} \\
		e(c_p, c_g) &= atan\left( k(c_p, c_g) \cdot tan\frac{\alpha}{2} \right)
	\end{aligned}
	\label{eq:EccentricityAngle}
\end{equation}
where $\bm{\mathfrak{d}} = (width /2, height / 2)$ is the half size of the display screen.\ADD{ $k(c_p, c_g)$ represents the ratio of the projection of the eccentricity angle.} The approximate falloff of the acuity is shown in Fig.\ref{fig:Probability}a.


\subsubsection{Geometry Silhouette}\label{sec:GeometrySilhouette}
A lot of studies suggest that the shape of an object can affect the perceived illumination \cite{vangorp2007influence, marlow2015material}. For instance, sphere is better than other simple shapes to interact with light, which is widely used to display the materials in many 3D engines.

In our algorithm, we make use of the G-buffer to detect the silhouette by computing the gradient of depth, the angle between normals and the color difference $\triangle c$ which is defined as the weighted Euclidean distance between the pixel color $\bm{\mathfrak{c}}$ and the neighbor color $\hat{c}$ in RGB space, as illustrated in Eq.\ref{eq:ColorDifference}, \ADD{these weights here are empirical results based on what the human eye perceives}. Once the value of one of the three channels exceeds the given threshold, we treat the pixel as geometry silhouette and set a constant value. From this, we can generate a sampling importance $\bm{\mathfrak{p}}_g$ (Fig.\ref{fig:Probability}b).

\begin{equation}
	\begin{aligned}
		\overline{r} &= \bm{\mathfrak{c}}.r - \hat{c}.r\\
		\triangle r  &= 256 * (\bm{\mathfrak{c}}.r - \hat{c}.r)\\
		\triangle g  &= 256 * (\bm{\mathfrak{c}}.g - \hat{c}.g)\\
		\triangle b  &= 256 * (\bm{\mathfrak{c}}.b - \hat{c}.b)\\
		\triangle c  &= \sqrt{(2 + \overline{r}) \cdot {\triangle r}^2 + 4 \cdot {\triangle g}^2 + (3 - \overline{r}) \cdot {\triangle b} ^ 2}
	\end{aligned}
\label{eq:ColorDifference}
\end{equation}

\subsubsection{Indirect Brightness}
Brightness has a great impact on the perception of HVS and visual acuity \cite{stengel2016adaptive}. For example, HVS is more sensitive to brightness, which is widely employed in video encoding to compress the video size. In our paper, we use the indirect brightness to define importance value $\bm{\mathfrak{p}}_b$ to model this phenomena. Unfortunately, the indirect brightness is unknown when we compute $\bm{\mathfrak{p}}_b$. We use the accumulated indirect brightness $b(p)$ of history frames to calculate $\bm{\mathfrak{p}}_b$ (Fig.\ref{fig:Probability}c), as illustrated in Eq.~\ref{eq:IndirectBrightness}:
\begin{equation}
	\label{eq:IndirectBrightness}
	\bm{\mathfrak{p}}_b = min\left (1, \delta \cdot b(p) \right)
\end{equation}
where $b(p)$ is the tone-mapped brightness of pixel $p$, and $\delta$ represents the influence factor of indirect brightness.

\begin{algorithm}[htb]
	\label{al:SamplesGeneration}
	\caption{Modified Quasi-Metropolis Hasting Algorithm}
	\KwIn{ iteration number for each Markov chain $M$, two-dimensional shuffled Halton values $H$, importance map $P$, G-buffer $G$
	}
	\KwOut{ generated sample $S$ }
	$j \gets threadId;$\\
	$coord \gets H[j*M];$\\
	$w \gets P(coord);$\\
	\For{$i \gets 1$ $to$ $M$}
	{
		$coord_{new} \gets H[j*M + i];$\\
		$w_{new} \gets P(coord_{new});$\\
		\If{$w_{new} > w$}
		{
			$coord \gets coord_{new};$\\
			$w \gets w_{new};$
		}
	}
	$S \gets generateSampleFromGBuffer(G, coord, w);$\\
	\Return $S$;
\end{algorithm}


\subsection{Choosing Shading Points Based on Importance Map}\label{sec:SamplesGeneration}

Based on $\bm{\mathfrak{p}_a}, \bm{\mathfrak{p}_g}, \bm{\mathfrak{p}_b}$, the importance value for pixel $p$ is defined as:

\begin{equation}
	\label{eq:ProbabilityWeight}
	\bm{\mathfrak{p}}(p) = \max(\bm{\mathfrak{p}_a}, \bm{\mathfrak{p}_g}) \cdot \bm{\mathfrak{p}_b}
\end{equation}

After the importance map $P$ is achieved, we modify the Metropolis-Hasting (M-H) \cite{hastings1970monte} method to choose pixels from $\mathbb{I}$ according to $P$. The traditional M-H method needs to compute an ideal Markov chain and the quality of the initial state will seriously affect the convergence speed \cite{szirmay1999start}. And its performance also does not meet our real-time requirement.

Since our importance map can be treated as a kind of probability map, we extend M-H sampling strategy to fit our method. At the same time, in order to improve the sampling efficiency, we implement the sampling process on GPU. Algorithm \ref{al:SamplesGeneration} presents the pseudo code of our GPU-based sampling algorithm.

Our sampling strategy is based on $N_s$ parallel threads, where $N_s$ is the number of desired samples. To begin with, we use Halton sequence to generate $N_s * M$ 2D quasi-random numbers and shuffle them, where each 2D number can be treated as the image coordinate of a proposal sample and $M$ is the length of each Markov chain. In thread \REP{$j \in \left [0, 1 \right)$}{$j \in \left [0, N_s \right)$}, we first get the image coordinate and importance value of a sample (line 2,3) and set it as the accepted pixel. Then we perform $M - 1$ iterations (line 4 - 9). For each iteration, we also get the coordinate and importance value of a new proposal sample (line 5,6) and compare its importance value with that of the accepted sample (line 7) to choose a greater one as the new accepted sample (line 8,9). Thus, we can generate a most important sample in each thread (line 10). After all threads ended, we can generate $N_s$ samples which have low-discrepancy and low-correlation.


%-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
\section{Stochastically Light Culling}\label{sec:StochasticLightCulling}
In this section, we present our method to cull VPLs based on their contributions to the final image quality. We firstly extend the Eq.\ref{eq:PixelContribution} to a more specific form, and elaborate on our visibility estimation method to accelerate visibility computation involved in $C_{l_v}$. After that, we describe our strategy to cull VPLs stochastically.

The contribution of a VPL $l_v$ to the final image quality is defined by Eq.\ref{eq:AccurateContribution}. As we have generated $N_s$ samples by Algorithm \ref{al:SamplesGeneration} according to importance weight $\bm{\mathfrak{p}}$ calculated by Eq.\ref{eq:ProbabilityWeight}, the contribution computation of $l_v$ is further extended from Eq.\ref{eq:PixelContribution} as:

\begin{equation}
	\begin{aligned}
	C(l_v \rightarrow \mathbb{R}) &= \frac{1}{N} \sum_{i=1}^{N} f(I_{l_v}, V_i, G_i, \bm{\mathfrak{p}}_i)\\
	&\approx \frac{1}{N_s}\sum_{i=1}^{N_s}\frac{I_{l_v} V_i cos(\theta_{p^\prime_i})cos(\theta_{l_v}) \bm{\mathfrak{p}}_i}{d_i^2 + \epsilon}
	\end{aligned}
	\label{eq:VPLContribution}
\end{equation}
where $G_i = \frac{cos(\theta_{p^\prime_i})cos(\theta_{l_v})}{d_i^2 + \epsilon}$ is the geometry term, $\bm{\mathfrak{p}}_i$ is the weight of sample $p^\prime_i$, and $d_i$ is the distance between $p^\prime_i$ and $l_v$. $\theta_{p^\prime_i}$ is the angle between $\vec{n}_{p^\prime_i}$ and the transmission direction, $\theta_{l_v}$ is the angle between $\vec{n}_{l_v}$ and the transmission direction, where $\vec{n}$ indicates normal vector. $\epsilon$ is a number to prevent mathematical singularities, and $V_i$ is the visibility between $p^\prime_i$ and $l_v$.

Evaluating the visibility term $V_i$ in Eq.~\ref{eq:VPLContribution} is computationally expensive even with the hardware ray tracing. In this paper, we propose BLSO method to approximate $V_i$ by local space occlusion information, which involves both VPL $l_v$ and sample $p^\prime_i$. Then the visibility $V_i$ is calculated as:

\begin{equation}
	V_i =   mix( T_{l_v}(\vec{d}), T_{p_i'}(-\vec{d}), \lambda)
	\label{eq:VisibilityInquiry}
\end{equation}
where $\vec{d}$ represents the direction from $l_v$ to $p_i'$, the function $T_{l_v}$ and $T_{p_i'}$ represent the $l_v$ and $p_i'$ visibility probability calculated by our method, ${\lambda}$ is a user-defined value. Due to the same calculation way for $T_{l_v}$ and $T_{p_i'}$, we take $T_{l_v}$ as an example to further describe the details.

As illustrated in Fig.\ref{fig:VisibilityEstimation}, we sample some random positions in the local space of the solid angle which regard $\vec{d}$ as the central axis, and then calculate the visibility probability of each sample ($A$, $B$ and $C$ as examples) one by one. For $A$, it is invisible to the light source, so we think $l_v$ may be occluded in this direction; But for $B$, it is visible to the light source, so we consider this direction has no occlusion; And $l_v$ is obvious occluded in this direction from it to $C$ because $C$ is on the surface of an object. Finally, we calculate the visibility for arbitrary sample position $s$ as:

\begin{figure}[t]
	\centering
	\begin{overpic}[scale = 0.45]{Figures/VisibilityEstimation.pdf}
	\end{overpic}
	\caption{Visibility Estimation. $l_v$ is the VPL, $n$ represents the normal, $d$ is the direction needed to estimate the visibility probability,
		and $A, B$ and $C$ represent the sampled position int local-space, the dotted red circle is the local hemisphere space.}
	\label{fig:VisibilityEstimation}
\end{figure}

\begin{equation}
	\centering
	g{(s)} = 
	\begin{cases}
		1 & {\triangle h \REP{$<$}{$\le$} - \xi}\\
		0 & \ADD{$- \xi <$} { \triangle h < \xi}\\
		1 - e^{-k \sqrt{\triangle h}} & {else}
	\end{cases}
	\label{eq:DetermineVisibilityByDepth1}
\end{equation}
where $\triangle h$ represents the difference of depths \ADD{in light source space} between $s$ and the \REP{corresponding}{sampled} value \ADD{according to the projection position of $s$} from \REP{shadow}{the depth} map, \ADD{it could be positive, negative, or zero,} $k$ is the attenuation factor, $\xi$ is a small positive threshold related to the scene.

Finally, we estimate the visibility probability of $l_v$ in direction $\vec{d}$ by accumulate the probabilities of all sampled positions as:

\begin{equation}
	T_{l_v}(\vec{d}) = \frac{1}{N^\prime} \sum_{i=1}^{N^\prime} g(i)
\end{equation}
where $N^\prime$ is the number of sampled positions. And the process of calculating $T_{p_i'}$ is the same.

After we approximate the visibility term in Eq.~\ref{eq:VPLContribution}, we can give the final contribution evaluation for $l_v$ as:
\begin{equation}
	\begin{aligned}
		C(l_v \rightarrow \mathbb{R}) &\approx \frac{1}{N_s}\sum_{i=1}^{N_s}\frac{I_{l_v} V_i cos(\theta_{p^\prime_i})cos(\theta_{l_v}) \bm{\mathfrak{p}}_i}{d_i^2 + \epsilon}\\
		&\approx \frac{1}{N_s}\sum_{i=1}^{N_s}\frac{I_{l_v} mix( T_{l_v}[\vec{d}], T_{p_i'}[-\vec{d}], \lambda) cos(\theta_{p^\prime_i})cos(\theta_{l_v}) \bm{\mathfrak{p}}_i}{d_i^2 + \epsilon}
	\end{aligned}
	\label{eq:finalVPLContribution}
\end{equation}

Based on Eq.~\ref{eq:finalVPLContribution}, we use a sampling strategy similar to \cite{ritschel2011making} to stochastically cull VPLs.

%-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
\begin{figure*}[t]
	\setlength{\lineskip}{0pt}
	\raisebox{60pt}{\makebox[10pt]{\rotatebox[origin=c]{90}{\textbf{Sibenik}}}}
	\subfigure{
		\begin{minipage}[b]{.241\textwidth}
			\centering
			\includegraphics[scale=0.163]{Figures/ShadingResult/Sibenik_RT.png}
		\end{minipage}
		\begin{minipage}[b]{.241\textwidth}
			\centering
			\includegraphics[scale=0.163]{Figures/ShadingResult/Sibenik_FLC.png}
		\end{minipage}
		\begin{minipage}[b]{.241\textwidth}
			\centering
			\includegraphics[scale=0.163]{Figures/ShadingResult/Sibenik_FIR.png}
		\end{minipage}
		\begin{minipage}[b]{.241\textwidth}
			\centering
			\includegraphics[scale=0.748]{Figures/ShadingResult/Details/Sibenik_Details.pdf}
		\end{minipage}
	}
	
	\vspace{-3mm}
	\raisebox{60pt}{\makebox[10pt]{\rotatebox[origin=c]{90}{\textbf{Sponza}}}}
	\subfigure{
		\begin{minipage}[b]{.241\textwidth}
			\centering
			\includegraphics[scale=0.163]{Figures/ShadingResult/Sponza_RT.png}
		\end{minipage}
		\begin{minipage}[b]{.241\textwidth}
			\centering
			\includegraphics[scale=0.163]{Figures/ShadingResult/Sponza_FLC.png}
		\end{minipage}
		\begin{minipage}[b]{.241\textwidth}
			\centering
			\includegraphics[scale=0.163]{Figures/ShadingResult/Sponza_FIR.png}
		\end{minipage}
		\begin{minipage}[b]{.241\textwidth}
			\centering
			\includegraphics[scale=0.748]{Figures/ShadingResult/Details/Sponza_Details.pdf}
		\end{minipage}
	}
	
	\vspace{-3mm}
	\raisebox{60pt}{\makebox[10pt]{\rotatebox[origin=c]{90}{\textbf{San Miguel}}}}
	\subfigure{
		\begin{minipage}[b]{.241\textwidth}
			\centering
			\begin{overpic}[scale=0.163]{Figures/ShadingResult/SanMiguel_RT.png}
				\put(40,-10){TBDS}
			\end{overpic}
		\end{minipage}
		\begin{minipage}[b]{.241\textwidth}
			\centering
			\begin{overpic}[scale=0.163]{Figures/ShadingResult/SanMiguel_FLC.png}
				\put(43,-10){FLC}
			\end{overpic}
		\end{minipage}
		\begin{minipage}[b]{.241\textwidth}
			\centering
			\begin{overpic}[scale=0.163]{Figures/ShadingResult/SanMiguel_FIR.png}
				\put(43,-10){FIR}
			\end{overpic}
		\end{minipage}
		\begin{minipage}[b]{.241\textwidth}
			\centering
			\begin{overpic}[scale=0.748]{Figures/ShadingResult/Details/SanMiguel_Details.pdf}
				\put(17,-10){Details Comparison}
			\end{overpic}
		\end{minipage}
	}
	\caption{Render results from TBDS (column 1), our method (column 2), FIR (column 3). The details in green squares are magnified in column 4 (top: TBDS; bottom-left: FLC; bottom-right: FIR). The red circles indicate the foveal regions.}
	\label{fig:ShadingResult}
\end{figure*}

\section{Other Implementation Details}\label{sec:Implementation}
In order to further reduce shading cost, we adapt \textit{log-polar transformation} (LPT) \cite{meng2018kernel} to accelerate rendering (Fig.\ref{fig:AlgorithmOverview} S-7). However, the forward transformation strategy used in LPT may result in a many-to-one map from Cartesian space to log-polar space, causing some pixels in log-polar space cannot get corresponding values. We use the backward strategy to transform the textures in G-Buffer based on their inverse log-polar transformation, which can be described as taking the corresponding values from Cartesian space to log-polar space.

During the shading process, we also perform \textit{interleaved sampling} method \cite{keller2001interleaved} in log-polar space to improve efficiency. As a result, we have to use a Gaussian filter with a 3 $\times$ 3 kernel in the whole space to reduce artifacts. Moreover, due to its inherent problems of LPT, there will be flickering in peripheral regions after the inverse transformation from log-polar space to Cartesian space. In order to alleviate flickers, we apply \textit{temporal anti-aliasing} (TAA) with different clamp size $\eta$ for different pixel\REP{:}{(Eq.\ref{eq:TAA}), which is because the region farther away from the fovea has a higher frequency of flicker due to downsampling.}

\begin{equation}
	\eta=3+2 \times\left\lfloor\frac{\frac{\left\|x^{\prime}, y^{\prime}\right\|_{2}}{e^{L}}-0.10}{0.05}\right\rfloor
	\label{eq:TAA}
\end{equation}
where $x^{\prime} = c_p.x - c_g.x$ and $y^{\prime} = c_p.y - c_g.y$. $L$ is the maximum distance between gaze position $c_g$ and one of the four screen corners. \ADD{The several constants here are set to determine the range and change rate of $\eta$, which affects the quality and efficiency of TAA.}


%------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
\section{Experimental Results}\label{sec:Result}
The method proposed in this paper has been implemented on a PC with RTX 2080 Ti GPU, i9-9900K CPU and 16GB memory. We also employ a HTC Vive equipped with binocular eye-tracking as the display device. We compare our method with \textit{Foveated Instant Radiosity} (FIR) \cite{wang2020foveated} and \textit{Tile-Based Deferred Shading} (TBDS) \cite{andersson2009parallel} in quality and performance. Meanwhile, we use the results of TBDS as the ground truth images because it takes all the VPLs who have contributions to current tile into consideration. RMSE (Root Mean Square Error) is employed to quantitatively measure the differences between the rendered and ground truth images.

All the results are rendered at 1024 $\times$ 1024 resolution, and the resolution of RSMs is 64 $\times$ 64. Other parameters are set as follows: the radius of foveal region is 150 pixels, the outer periphery acuity $\omega_p = 0.2$ (Eq.\ref{eq:AcuityImportance}), the acuity slope $s = 2.25$ (Eq.\ref{eq:AcuityFallOff}), the indirect brightness influence factor $\delta = 20$ (Eq.\ref{eq:IndirectBrightness}), the mixing coefficient $\lambda = 0.5$ (Eq.\ref{eq:VisibilityInquiry}), and attenuation factor $k = 34$ (Eq.\ref{eq:DetermineVisibilityByDepth1}).

Fig.~\ref{fig:ShadingResult} illustrates the images generated by TBDS(column 1), FLC (column 2) and FIR (column 3) for different scenes, where the foveal regions are marked by red circles. We also enlarge the regions marked by green rectangles to compare the details (last column of Fig.~\ref{fig:ShadingResult}). Additionally, RMSEs of the same foveal regions indicated in Fig.~\ref{fig:ShadingResult} are as shown in Fig.\ref{fig:RMSE}. The results demonstrated in Fig.\ref{fig:ShadingResult} and Fig.\ref{fig:RMSE} suggest the following facts:

\begin{itemize}
  \item Both Fig.\ref{fig:ShadingResult} and Fig.\ref{fig:RMSE} indicate that our method produces better image quality in the foveal region than FIR. This is because the importance computation in FIR ignores the intensities of VPLs. For this reason, even though the VPLs used for shading are close to the foveal region, many of them cannot provide sufficient indirect illumination due to low intensity.
  \item Comparing the color of wall in the first row of Fig.\ref{fig:ShadingResult}, we can find that the color bleeding effects generated by our method is better than FIR. This is because the VPLs of our method are generated from RSMs, which provide the single-bounce indirect illumination. On the contrary, the VPLs of FIR are generated from camera so that they are multi-bounced and their intensities are very low. Thus, the color bleeding effects generated by FIR are poor.
  \item Our method generates more blurry image in the peripheral regions than FIR, which is more in line with the principle of foveated rendering. The major reason causing the blur effects in the peripheral regions is our strategy of conducting shading in log-polar space.
\end{itemize}

\begin{figure}[t]
	\centering
	\begin{overpic}[scale=0.5]{Figures/RMSE.pdf}
		\put(10,-2){Sibenik}
		\put(39,-2){Sponza}
		\put(66,-2){San Miguel}
		\put(-1,15) {\rotatebox{90}{FIR}}
		\put(-1,42) {\rotatebox{90}{FLC}}
	\end{overpic}
	\caption{Visualization of RMSE compared to ground truth for the images rendered with our FLC (row 1) and FIR (row 2) in different scenes. The red circles represent the foveal regions. The value at the bottom-right in every image represents the RMSE in the foveal region.}
	\label{fig:RMSE}
\end{figure}

\begin{table}[hbt]
	\setlength{\abovecaptionskip}{0.cm}
	\setlength{\belowcaptionskip}{0.1cm}
	\caption{Comparison of RMSE in different scenes. }
	\centering
	\small
	\setlength{\tabcolsep}{0.8mm}{
		\begin{tabular}{|c|c|c|c|c|c|c|}
			\hline
			\multicolumn{1}{|c|}{\textbf{}}  & \multicolumn{2}{c|}{Sibenik} & \multicolumn{2}{c|}{Sponza} & \multicolumn{2}{c|}{SanMiguel}  \\
			\hline
			Method	& Fovea & Periphery & Fovea & Periphery & Fovea & Periphery  \\
			\hline
			FLC		& 0.03421	& 0.04394	& 0.02501	& 0.03079	& 0.04231	& 0.06199 \\
			FIR		& 0.05093	& 0.05377	& 0.04047	& 0.04473	& 0.04364	& 0.07319 \\
			\hline
		\end{tabular}
	}
	\label{tab:RMSEComparison}
\end{table}

Table \ref{tab:RMSEComparison} shows more comparisons between FLC and FIR on the RMSEs of foveal and peripheral regions for different scenes. We compute the ratio $\bm{\mathfrak{r}} = RMSE_{P}/RMSE_{F}$ to indicate how much computation power is dedicated to the image quality of foveal region. The bigger $\bm{\mathfrak{r}}$  is, the more computation resources are assigned to the foveal region. The average ratio for three scenes of our method is 1.33, which is larger than the corresponding value 1.28 of FIR. This fact proves that our method assigns more computation resources to the foveal region than FIR.

Fig.\ref{fig:TimeComparison} compares the rendering performance between our method with FIR and TBDS. It can been seen that our method is slightly faster than FIR. And comparing with TBDS, our method has a huge performance advantage because TBDS actually uses all valid VPLs in the shading computation which is too time-consuming.

\begin{figure}[htb]
	\centering
	\includegraphics[scale=0.6]{Figures/TimeComparison.pdf}
	\caption{The total render time of different methods in different scenes. }
	\label{fig:TimeComparison}
\end{figure}

As illustrated in Table\ref{tab:RMSEComparison} and Fig.\ref{fig:TimeComparison}, our method is better in both quality and efficiency. This is because we employ more precise contribution estimation method to cull VPLs and adapt BLSO to approximate visibility. Moreover, our method can generate similar results to TBDS and achieves real-time frame rates.

\begin{table}[hbt]
	\setlength{\abovecaptionskip}{0.cm}
	\setlength{\belowcaptionskip}{0.1cm}
	\caption{Computation time for every pass in different scenes of FLC.}
	\centering
	\small
	\begin{tabular}{|c|c|c|c|c|}
		\hline
		Render Passes (ms)  & Sibenik  & Sponza  & SanMiguel\\
		\hline
		G-buffer		    & 0.340    & 0.466   & 1.032 \\
		RSMs				& 0.669    & 0.847   & 1.375\\
		\textbf{Importance Sampling}					
		& 0.313    & 0.382  & 0.391\\
		\textbf{Contribution Estimating}				
		& 7.089    & 6.454  & 6.922\\
		Light Culling		& 0.014    & 0.014   & 0.014\\
		Shading				& 12.367   & 13.051  & 12.195\\
		Post-Antialiasing	& 5.087    & 5.646   & 5.454\\
		\hline
		\rule{0pt}{10pt}
		\textbf{Total}      & \textbf{25.879} & \textbf{26.860} & \textbf{27.383}  \\
		\hline
	\end{tabular}
	\label{tab:timing}
\end{table}

We also measure the performance details of our method for different scenes. The results are shown in Table.\ref{tab:timing}. It can \REP{been}{be} seen that the three most expensive stages are: contribution estimating, shading and post-antialiasing. For the contribution estimating stage, it needs to evaluate Eq.~\ref{eq:finalVPLContribution} for all VPLs, including the visibility computations. As for the post-antialiasing stage, the high computation cost is because of \textit{Gaussian Blur} and TAA.

\begin{figure}[htb]
	\centering
	\includegraphics[scale=0.6]{Figures/VPLContrast.pdf}
	\caption{Comparison of total render time and RMSE between our method with and without BLSO under different VPL numbers in Sponza scene.}
	\label{fig:VPLContrast}
\end{figure}

\begin{figure}[htb]
	\centering
	\includegraphics[scale=0.6]{Figures/UserStudy.pdf}
	\caption{The percentage of participants' visual perception that our method succeeds or fails to confuse.}
	\label{fig:UserStudy}
\end{figure}

\begin{figure}[htb]
	\centering
	\begin{overpic}[scale=0.55]{Figures/Limitation.pdf}
		\put(22,-1){(a)}
		\put(70,-1){(b)}
	\end{overpic}
	\caption{Comparison of total render time and RMSEs in Living Room scene. (a) is our method and (b) is our method without any shading acceleration algorithms. The values at the bottom-left are RMSE in the foveal region, and the values at the bottom-right are the RMSE in the peripheral region.}
	\label{fig:Limitation}
\end{figure}

Furthermore, in order to analyze the impact of the number of VPLs remained to participate shading on the rendering performance and quality of our method, we use multiple configuration of this number from 256 to 4096 for the Sponza scene. The results are shown in Fig.\ref{fig:VPLContrast}, which indicate that the rendering cost increases almost quadratically with the number of VPLs. However, the RMSEs decrease slowly when the number of VPLs increases from 256 to 1024. Moreover, the RMSEs decrease sharply between 1024 and 2048 and then stabilize again. Therefore, in order to balance the foveal quality and performance, a number between 1024 and 2048 is a proper choice for shading. As illustrated in Fig.\ref{fig:VPLContrast}, turning off our BLSO strategy will greatly increase the RMSEs. Our BLSO method can significantly improve the foveal quality without hurting the performance too much.

Meanwhile, we also compare the rendering efficiency and RMSEs of our method with and without LPT and interleaved sampling, which is illustrated in Fig.\ref{fig:Limitation}. The results show that even though LPT and interleaved sampling can greatly improve our shading performance, the RMSEs of our method are mainly caused by them, which will produce a lot of aliasing at geometry silhouette due to downsampling and \textit{Gaussian Blur}.

Finally, we have invited 21 participants to watch the image sequences generated by our method with HMD. Since different people's visual perception is different, we first ask these participants to set some properties for themselves, such as foveal size and parameters of \cite{meng2018kernel}. Then we use these properties to render image sequences by our method and TBDS in the four scenes. The image sequences contain different contents, for example C1, C2, C3 and C4 represent static scenes, scenes with dynamic light source, scenes with dynamic object and scenes with dynamic light source and object respectively in Fig.\ref{fig:UserStudy}. At last, participants are asked to watch the image sequences of the same contents generated by different methods and to identify which method is used and where is the difference. For each scene, we have at least six trials. The percentage of success means that our method successfully confuses the judgment of the participants (Fig.\ref{fig:UserStudy}, blue bar). In other cases, the participants correctly point out the differences expressed as the percentage of failure in our method (Fig.\ref{fig:UserStudy}, orange bar).

For our study, participants are better able to distinguish the differences in static scenes, but the percentage of failure accounts for the majority in dynamic scenes. This is because the dynamic factors in the scenes will affect the perception of the human eyes, resulting in its insufficient sensitivity to details.


\section{Conclusion and Future Work}\label{sec:Discussion}
This paper presents a method to generate single-bounce diffuse-to-diffuse indirect illumination for foveated rendering by efficiently culling the VPLs based on their contributions to the visual perception. We propose a visually importance-guide sampling strategy to precisely calculate the VPLs' contributions and an efficient estimation method to approximate the local visibility between the VPLs and pixel samples. Even though we can produce realistic global illumination in the foveal region, there are still some limitations to be addressed.

\textbf{Global Visibility}. Since our BLSO method is based on local space, it will cause false visibility in some cases. What's more, we can conduct a more precise method to approximate the visibility, which can be used both in contribution estimating and shading. So one of the most important goals in the future is to improve the accuracy of visibility evaluating in global space.

\textbf{Multi-bounce indirect illumination and glossy materials}. In this paper, we only study the single-bounce diffuse-to-diffuse indirect illumination. Though it is enough to generate a good global illumination in most cases, it will be dark at certain positions. Moreover, in the real life, there are many glossy material such as glass, mirror and metal products. So we will try to extend this method to support glossy materials in the future.

\textbf{Efficiency}. Although our method can reach the 45 FPS, it is till slower than the required frame rate of VR applications. Therefore, we have to further improve the performance of our method.

%%Vancouver style references.
\bibliographystyle{cag-num-names}
\bibliography{refs}

\end{document}

%%
